Post-MT Term Swapper: Supplementing a Statistical Machine Translation System with a User Dictionary

Authors

  • Masaki Itagaki
  • Takako Aikawa
Abstract

A statistical machine translation (SMT) system requires homogeneous training data in order to produce domain-sensitive (or context-sensitive) terminology translations. If the data span various domains, it is difficult for an SMT system to learn context-sensitive terminology mappings probabilistically. Yet terminology translation accuracy is an important issue for MT users. This paper explores an approach to this terminology translation problem for an SMT system. We propose a way to identify terminology translations in MT output and automatically swap them with user-defined translations. Our approach is simple and can be applied to any type of MT system. We call our prototype "Term Swapper." Term Swapper allows MT users to draw on their own dictionaries without affecting any part of the MT output except the terminology translation(s) in question. Using MSR-MT, an SMT system developed at Microsoft Research (Quirk et al., 2005; Menezes & Quirk, 2005), we conducted initial experiments to investigate the coverage rate of Term Swapper and its impact on the overall quality of MT output. The results of our experiments show high coverage and a positive impact on overall MT quality.
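
The abstract describes the swap only at a high level, so the following is a minimal sketch of one way a post-MT term swap could work with a black-box MT system, not the authors' implementation. It assumes (1) the MT system can be called on an isolated source term to obtain its default translation, and (2) that same string appears verbatim in the full-sentence output. The names `swap_terms`, `coverage`, and `translate`, the toy data, and the coverage definition (swapped terms divided by dictionary terms found in the source) are all hypothetical.

```python
import re
from typing import Callable, Dict, Tuple


def swap_terms(
    source: str,
    mt_output: str,
    user_dict: Dict[str, str],
    translate: Callable[[str], str],
) -> Tuple[str, int, int]:
    """Swap the MT system's translations of dictionary terms for user-defined ones.

    Returns (new_output, terms_found_in_source, terms_swapped).
    """
    result = mt_output
    found = swapped = 0
    # Try longer source terms first so a multi-word entry is handled before
    # a shorter entry it contains (e.g. "hard disk drive" before "hard disk").
    for src_term in sorted(user_dict, key=len, reverse=True):
        # Case-insensitive whole-word match of the term in the source sentence.
        if not re.search(r"\b" + re.escape(src_term) + r"\b", source, re.IGNORECASE):
            continue
        found += 1
        # Assumption: the MT system's translation of the term in isolation is
        # the same string it produced inside the full sentence.
        mt_term = translate(src_term)
        if mt_term and mt_term in result:
            # Plain substring replacement: everything else in the output is untouched.
            result = result.replace(mt_term, user_dict[src_term])
            swapped += 1
    return result, found, swapped


def coverage(found: int, swapped: int) -> float:
    """Assumed metric: share of source-side term hits that were located and swapped."""
    return swapped / found if found else 0.0


if __name__ == "__main__":
    # Toy stand-in for a black-box MT system (hypothetical data).
    toy_mt = {"hard disk": "disco duro"}
    new, found, swapped = swap_terms(
        "Format the hard disk before installing.",
        "Formatee el disco duro antes de instalar.",
        {"hard disk": "disco rígido"},
        lambda text: toy_mt.get(text, text),
    )
    print(new)                       # Formatee el disco rígido antes de instalar.
    print(coverage(found, swapped))  # 1.0
```

Verbatim string replacement is what keeps the rest of the MT output unchanged, but under these assumptions it also misses inflected or reordered translations, and a swapped term may not agree grammatically with its context (for example, gender agreement in Spanish).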

Similar resources

Portable Knowledge Sources for Machine Translation

In this paper, we describe the acquisition and organization of knowledge sources for machine translation (MT) systems. It has been pointed out by many users that one of the most annoying things about MT systems is the repeated occurrence of identical errors in word sense and attachment disambiguation. We show the limitations of a conventional user-dictionary method and explain how our ap...

CS224N: Investigating SMS Text Normalization using Statistical Machine Translation

In this project we explore two approaches to SMS text normalization. First we try a dictionary substitution approach used by most websites that provide such a service, and then modify it with our extension. This is followed by a statistical machine translation (MT) approach using off-the-shelf MT tools. We evaluate the performance of our system on three test sets from different sources and disc...

User Adaptation in a Hybrid MT System - Feeding User Corrections into Synchronous Grammars and System Dictionaries

In this paper we present the User Adaptation (UA) module implemented as part of a novel Hybrid MT translation system. The proposed UA module allows the user to enhance core system components such as synchronous grammars and system dictionaries at run-time. It is well-known that allowing users to modify system behavior raises the willingness to work with MT systems. However, in statistical MT sy...

Domain adaptation of MT systems through automatic post-editing

It is generally acknowledged that the performance of rule-based machine translation (RBMT) systems can be greatly improved through domain-specific system adaptation. To that end, RBMT users often choose to invest significant resources into the development of ad hoc MT dictionaries. In this paper, we demonstrate that comparable customization effects can be achieved automatically. One effective wa...

The Universitat d'Alacant hybrid machine translation system for WMT 2011

This paper describes the machine translation (MT) system developed by the Transducens Research Group, from Universitat d’Alacant, Spain, for the WMT 2011 shared translation task. We submitted a hybrid system for the Spanish–English language pair consisting of a phrase-based statistical MT system whose phrase table was enriched with bilingual phrase pairs matching transfer rules and dictionary e...

Publication date: 2008